Inductive learning models with missing values

نویسندگان

  • Inmaculada Fortes
  • Llanos Mora López
  • R. Morales
  • Francisco Triguero Ruiz
چکیده

In this paper, a new approach to working with missing attribute values in inductive learning algorithms is introduced. Three fundamental issues are studied: the splitting criterion, the allocation of values to missing attribute values, and the prediction of new observations. The formal definition for the splitting criterion is given. This definition takes into account the missing attribute values and generalizes the classical definition. In relation to the second objective, multiple values are assigned to missing attribute values using a decision theory approach. Each of these multiple values will have an associated confidence and error parameter. The error parameter measures how near or how far the value is from the original value of the attribute. After applying a splitting criterion, a decision tree is obtained (from training sets with or without missing attribute values). This decision tree can be used to predict the class of an observation (with or without missing attribute values). Hence, there are four perspectives. The three perspectives with missing attribute values are studied and experimental results are presented. c © 2006 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Process Models with Missing Data

In this paper, we review the task of inductive process modeling, which uses domain knowledge to compose explanatory models of continuous dynamic systems. Next we discuss approaches to learning with missing values in time series, noting that these efforts are typically applied for descriptive modeling tasks that use little background knowledge. We also point out that these methods assume that da...

متن کامل

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...

متن کامل

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods to estimate missing values of which some are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data in Sarab synoptic station, east Azerbaijan province, Iran was randomly considered missing values. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

متن کامل

CLP(BN): Constraint Logic Programming for Probabilistic Knowledge

In Datalog, missing values are represented by Skolem constants. More generally, in logic programming missing values, or existentially­ quantified variables, are represented by terms built from Skolem functors. In an analogy to probabilistic relational models (PRMs), we wish to represent the joint probability distribution over missing values in a database or logic program us­ ing a Bayesian netw...

متن کامل

A Comparative Review of Selection Models in Longitudinal Continuous Response Data with Dropout

Missing values occur in studies of various disciplines such as social sciences, medicine, and economics. The missing mechanism in these studies should be investigated more carefully. In this article, some models, proposed in the literature on longitudinal data with dropout are reviewed and compared. In an applied example it is shown that the selection model of Hausman and Wise (1979, Econometri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Mathematical and Computer Modelling

دوره 44  شماره 

صفحات  -

تاریخ انتشار 2006